Automatic speaker age and gender recognition using acoustic and prosodic level information fusion

Authors

  • Ming Li
  • Kyu Jeong Han
  • Shrikanth S. Narayanan
Abstract

The paper presents a novel automatic speaker age and gender identification approach which combines seven different methods at both acoustic and prosodic levels to improve the baseline performance. The three baseline subsystems are (1) Gaussian mixture model (GMM) based on mel-frequency cepstral coefficient (MFCC) features, (2) support vector machine (SVM) based on GMM mean supervectors and (3) SVM based on 450-dimensional utterance-level features including acoustic, prosodic and voice quality information. In addition, we propose four subsystems: (1) SVM based on UBM weight posterior probability supervectors using the Bhattacharyya probability product kernel, (2) sparse representation based on UBM weight posterior probability supervectors, (3) SVM based on GMM maximum likelihood linear regression (MLLR) matrix supervectors and (4) SVM based on the polynomial expansion coefficients of the syllable-level prosodic feature contours in voiced speech segments. Contours of pitch, time domain energy, frequency domain harmonic structure energy and formant for each syllable (segmented using energy information in the voiced speech segment) are considered for analysis in subsystem (4). The proposed four subsystems have been demonstrated to be effective and able to achieve competitive results in classifying different age and gender groups. To further improve the overall classification performance, weighted summation based fusion of these seven subsystems at the score level is demonstrated. Experimental results are reported on the development and test sets of the 2010 Interspeech Paralinguistic Challenge aGender database. Compared to the SVM baseline system (3), which is the baseline system suggested by the challenge committee, the proposed fusion system achieves 5.6% absolute improvement in unweighted accuracy for the age task and 4.2% for the gender task on the development set. On the final test set, we obtain 3.1% and 3.8% absolute improvement, respectively. © 2012 Elsevier Ltd. All rights reserved.
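Two components named in the abstract lend themselves to a compact sketch: the Bhattacharyya probability product kernel over UBM weight posterior supervectors, and the weighted summation fusion of subsystem scores. The code below is illustrative only; the function names, the simple frame-posterior averaging, and the fixed fusion weights are assumptions for the sketch, not details taken from the paper.

```python
import numpy as np

def weight_supervector(frame_posteriors):
    """Average per-frame UBM component posteriors into an utterance-level
    weight supervector (a point on the probability simplex).
    Note: a simplification assumed for illustration."""
    w = np.asarray(frame_posteriors, dtype=float).mean(axis=0)
    return w / w.sum()

def bhattacharyya_kernel(w1, w2):
    """Bhattacharyya probability product kernel between two weight
    supervectors: K(w1, w2) = sum_i sqrt(w1_i * w2_i).
    Equals 1 when the two supervectors are identical."""
    return float(np.sum(np.sqrt(w1 * w2)))

def fuse_scores(score_list, weights):
    """Weighted summation fusion at the score level: combine per-subsystem
    class-score vectors and decide by argmax over classes."""
    fused = sum(a * np.asarray(s, dtype=float)
                for a, s in zip(weights, score_list))
    return fused, int(np.argmax(fused))
```

A kernel of this form can be plugged into an SVM via a precomputed Gram matrix; the fusion weights would in practice be tuned on the development set rather than fixed by hand.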


Related articles

Combining five acoustic level modeling methods for automatic speaker age and gender recognition

This paper presents a novel automatic speaker age and gender identification approach which combines five different methods at the acoustic level to improve the baseline performance. The five subsystems are (1) Gaussian mixture model (GMM) system based on mel-frequency cepstral coefficient (MFCC) features, (2) Support vector machine (SVM) based on GMM mean supervectors, (3) SVM based on GMM maxi...


Demographic recommendation by means of group profile elicitation using speaker age and gender recognition

In this paper we show a new method of using automatic age and gender recognition to recommend a sequence of multimedia items to a home TV audience comprising multiple viewers. Instead of relying on explicitly provided demographic data for each user, we define an audio-based demographic group profile that captures the age and gender for all members of the audience. A 7-class age and gender class...


A Comparative Study of Gender and Age Classification in Speech Signals

Accurate gender classification is useful in speech and speaker recognition as well as speech emotion classification, because a better performance has been reported when separate acoustic models are employed for males and females. Gender classification is also apparent in face recognition, video summarization, human-robot interaction, etc. Although gender classification is rather mature in a...


On the use of high-level information in speaker and language recognition

Automatic Speaker Recognition systems have been largely dominated by acoustic-spectral based systems, relying on proper modeling of the short-term vocal tract of speakers. However, there is scientific and intuitive evidence that speaker-specific information is embedded in the speech signal in multiple short- and long-term characteristics. In this work, a multilevel speaker recognition system com...


Automatic discrimination between laughter and speech

Emotions can be recognized by audible paralinguistic cues in speech. By detecting these paralinguistic cues that can consist of laughter, a trembling voice, coughs, changes in the intonation contour etc., information about the speaker’s state and emotion can be revealed. This paper describes the development of a gender-independent laugh detector with the aim to enable automatic emotion recognit...


Journal:
  • Computer Speech & Language

Volume 27, Issue 

Pages  -

Publication date: 2013